
Workshop #4

The AI Engines and Applications

 

 

"Tackling Challenges of Federated Learning for Next-Generation Smart Health Systems"

Prof. Guoliang XING (CUHK)

Abstract

Recent years have witnessed the emergence of new sensor, IoT and AI technologies, which together will revolutionize the landscape of medical research and treatment. In particular, running on off-the-shelf smart devices, these technologies can identify new physiological, behavioral, and cognitive biomarkers and facilitate early diagnosis and intervention. However, the prevalence of smart devices raises widespread concerns about privacy violations. Federated Learning (FL) has recently received significant interest thanks to its ability to protect data privacy. However, existing FL paradigms yield unsatisfactory performance for large-scale real-world health applications such as human activity recognition, because they are oblivious to the intrinsic relationships among different users' data. Moreover, they perform poorly on the long-tailed and heterogeneous data distributions that are prominent in real-world environments.

In this talk, I will discuss our recent work on tackling the challenges of FL for health systems. First, I will present ClusterFL, a clustering-based federated learning system that provides high model accuracy and low communication overhead. ClusterFL features a novel clustered multi-task federated learning framework that automatically captures the intrinsic clustering relationships among users. Second, I will present FedDL, a novel FL system that learns personalized models for different users through a dynamic layer-sharing scheme. Lastly, I will describe BalanceFL, a federated learning framework that robustly learns both common and rare classes from a long-tailed real-world dataset, addressing global and local data imbalance at the same time. In collaboration with the CUHK medical school, our technologies will be deployed and validated through a large-scale clinical trial.
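The clustering idea behind this line of work can be pictured with a short sketch. The code below is a minimal, self-contained illustration of clustering-based federated averaging under assumed conditions: a hypothetical server groups clients by the similarity of their uploaded weight vectors using plain k-means and then averages within each cluster, so each group of similar users gets its own aggregated model. The function and the toy data are invented for illustration and do not reproduce ClusterFL's actual clustered multi-task formulation.

# A minimal sketch of clustering-based federated averaging (illustrative only;
# not ClusterFL's actual algorithm). Each client uploads a flattened weight
# vector; the server groups similar clients with plain k-means and averages
# within each cluster, yielding one model per group of similar users.
import numpy as np

def cluster_federated_average(client_weights, num_clusters=2, num_iters=10, seed=0):
    """Group clients by weight similarity, then average within each group."""
    W = np.stack(client_weights)                       # shape: (num_clients, dim)
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), num_clusters, replace=False)]
    for _ in range(num_iters):                         # standard k-means iterations
        dists = np.linalg.norm(W[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_clusters):
            if np.any(assign == k):
                centers[k] = W[assign == k].mean(axis=0)
    # One aggregated model per cluster instead of a single global model.
    cluster_models = {k: W[assign == k].mean(axis=0)
                      for k in range(num_clusters) if np.any(assign == k)}
    return assign, cluster_models

# Toy usage: six clients whose weights fall into two natural groups.
clients = [np.full(4, 0.1) + 0.01 * i for i in range(3)] + \
          [np.full(4, 1.0) + 0.01 * i for i in range(3)]
assignments, models = cluster_federated_average(clients, num_clusters=2)
print(assignments, {k: v.round(2) for k, v in models.items()})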

 


"Software and Hardware Co-Design for Multi-Tenant DNN Accelerator in the Cloud"

Prof. Yu WANG (THU)

Abstract

We have witnessed the rapid growth of Deep Neural Networks (DNNs) over the past decade. DNN-enabled technology has made a great impact on almost every aspect of our lives. However, the high computation and storage complexity of neural network inference poses great difficulties for its application. Over the past seven years, both academia and industry have devoted substantial effort to designing Domain-Specific Accelerators (DSAs) for DNN applications in order to achieve low-power, high-performance DNN inference acceleration.

INFerence-as-a-Service (INFaaS) has become a primary workload in the cloud. However, existing DNN accelerators are mainly optimized for the fastest execution of a single task, and the multi-tenancy required by INFaaS has not yet been explored. As the demand for INFaaS keeps growing, simply increasing the number of DNN accelerators is not cost-effective, while merely sharing these single-task-optimized accelerators through time-division multiplexing can lead to poor isolation and heavy performance loss for INFaaS.
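The isolation problem with time-division sharing can be made concrete with a toy queueing sketch. The simulation below rests on invented assumptions rather than any real accelerator: two hypothetical tenants share one non-preemptive device in arrival order, and the latency-sensitive tenant's small requests queue behind another tenant's long-running batch job.

# A toy simulation of time-division multiplexing on a single non-preemptive
# device (illustrative assumptions only). Requests are served in arrival
# order, so a latency-sensitive tenant suffers when its requests queue
# behind another tenant's long-running job.
def simulate_tdm(requests):
    """requests: list of (arrival_time, tenant, service_time), times in ms."""
    device_free = 0.0
    latencies = {}
    for arrival, tenant, service in sorted(requests):
        start = max(arrival, device_free)        # wait until the device is free
        device_free = start + service
        latencies.setdefault(tenant, []).append(device_free - arrival)
    return {t: sum(v) / len(v) for t, v in latencies.items()}

# Tenant A issues small 2 ms inferences every 5 ms; tenant B submits one
# 50 ms batch job at t=0.  A's average latency balloons far beyond 2 ms.
reqs = [(0.0, "B", 50.0)] + [(5.0 * i, "A", 2.0) for i in range(1, 10)]
print(simulate_tdm(reqs))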

This talk will first introduce the basic ideas of software and hardware co-design for DNN acceleration, focusing on FPGA-based DNN accelerators. It will then move on to the cloud scenario, presenting and discussing techniques and methodologies for enabling multi-tenancy on DNN accelerators.

 


"Recent Advances in AI Acceleration"

Prof. Wayne LUK (ICL)

Abstract

This talk presents recent advances in accelerating AI workloads. It begins with an overview of design techniques and parametric building blocks for accelerating a variety of AI operations, from neural networks to reinforcement learning. Next, an approach is described for speeding up constraint-based causal discovery by shifting performance bottlenecks. The capability of this approach is then illustrated by reducing the run time of the ‘pcalg’ software tool on the DREAM5-Insilico gene expression dataset from 79 hours on an 8-core Xeon processor to 8 minutes on an Arria-10 GX FPGA. Finally, some thoughts on future directions for AI acceleration will be offered.
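As a rough illustration of why constraint-based causal discovery is so expensive, the sketch below implements a simplified skeleton phase of a PC-style algorithm, assuming Gaussian data and Fisher's z test of partial correlation. The function names and the toy chain example are made up for illustration; the sketch is not the talk's actual design, and only shows that the run time of such algorithms is dominated by a combinatorial number of conditional-independence tests.

# A simplified skeleton phase of a PC-style constraint-based causal discovery
# algorithm (illustrative only; not the accelerated design from the talk).
# It assumes Gaussian data and uses Fisher's z test of partial correlation.
# The nested loops over edges and conditioning sets show where the run time
# goes: the number of conditional-independence tests grows combinatorially.
from itertools import combinations
import numpy as np
from scipy.stats import norm

def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the set `cond`."""
    idx = [i, j] + list(cond)
    precision = np.linalg.pinv(corr[np.ix_(idx, idx)])
    return -precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1])

def pc_skeleton(data, alpha=0.01, max_cond=2):
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(p)) - {i} for i in range(p)}       # start fully connected
    for level in range(max_cond + 1):                      # growing conditioning sets
        for i in range(p):
            for j in list(adj[i]):
                if j < i:
                    continue
                for cond in combinations(adj[i] - {j}, level):   # the CI-test bottleneck
                    r = np.clip(partial_corr(corr, i, j, cond), -0.999999, 0.999999)
                    z = 0.5 * np.log((1 + r) / (1 - r))
                    stat = np.sqrt(n - level - 3) * abs(z)
                    if 2 * norm.sf(stat) > alpha:           # cannot reject independence
                        adj[i].discard(j); adj[j].discard(i)     # drop the edge
                        break
    return adj

# Toy usage on a chain X0 -> X1 -> X2: the X0--X2 edge is removed given {X1}.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = x0 + rng.normal(size=2000)
x2 = x1 + rng.normal(size=2000)
print(pc_skeleton(np.column_stack([x0, x1, x2])))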

 

